ABSTRACT


It is easy to argue that open data is critical to enabling faster and more effective research discovery. In this
article, we describe the approach we have taken at Wiley to support open data and to begin making more data
FAIR (Findable, Accessible, Interoperable and Reusable) through four data policies: “Encourages”, “Expects”,
“Mandates” and “Mandates and Peer Reviews Data”. We describe the rationale for these policies and their
adoption so far. In the coming months, we plan to measure and monitor the implementation of these policies
through the publication of data availability statements and data citations. With this information, we’ll be able
to celebrate the adoption of data-sharing practices by the research communities we work with and serve, and
we hope to showcase researchers from those communities who are leading in open research.


1. BACKGROUND AND MOTIVATION 


“Open research” and “open science” are two interchangeable terms that encompass a number of
practices that are becoming widely adopted [1,2]. While definitions of open research and open science
come in many flavors (see Table 1), their core elements include the open accessibility and dissemination of
research outputs, including outputs beyond traditional journal articles.


† Corresponding author: Yan Wu (E-mail: ywu2@wiley.com; ORCID: 0000-0001-7610-9465).


chinaXiv: 202211.00464v1
Paving the Way to Open Data 


Table 1. Definitions of open research and open science. 


Attribution | Definition
Foster Open Science | “Open Science is the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods.” [3]
European Commission | “A broad term, covering the many exciting developments in how science is becoming more open, accessible, efficient, democratic, and transparent. This Open Science revolution is being driven by new, digital tools for scientific collaboration, experiments and analysis and which make scientific knowledge more easily accessible by professionals and the general public, anywhere, at any time.” [4]
Michael Nielsen | “The idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process” [5]
Center for Open Science | “Openness and reproducibility are core scientific values because science is a distributed, non-hierarchical culture for accumulating knowledge. No individual is the arbiter of truth. Knowledge accumulates by sharing information and independently reproducing results.” [6]


At Wiley, the researcher is our “North Star” as explained by Judy Verses (Executive Vice President, Wiley) 
in her keynote talk at the APE2019 conference in Berlin, Germany [7]. This means that we put researchers 
at the heart of our research publishing and educational services. We listen to the research communities we 
serve and — by tailoring open research initiatives to the needs of researchers in particular disciplines — we 
support their open research aspirations. Adopting open practices, but phasing their implementation to suit 
different communities, is our focus. We organize our work in five key areas: open access, open practices, 
open collaboration, open recognition and reward, and of course, open data [8]. 


“Open data” is an often-used term for sharing data, and is perhaps made more meaningful by the term 
FAIR (Findable, Accessible, Interoperable and Reusable) [9]. After open access, “open data” (or better: FAIR 
data) is probably one of the most important elements of open research [10]. FAIR data have the potential 
to revolutionize the way research is done and communicated, and we are seeing benefits in research
discoveries as a result [11]. Open research initiatives, like open data, bring many benefits, including
increased transparency as well as, potentially, enhanced reproducibility and amplified impact [12]. Funders 
and institutions recognize this and are increasingly requiring researchers to share data [13]. 


However, given the scale and variety of data, the complexity of how best to share data, the need for new 
practices and habits by research communities, and the need for technology and infrastructure to support 
data sharing, it is clear that collaboration across all stakeholders is key. This is a challenge we must all
embrace if we are going to make progress.


To reflect our commitment to open research and to support researchers in sharing their data, Wiley 
recently updated its data sharing and citation policies [14]. In the rest of this article, we will share the 
approach we took with our data policies, how this fits with approaches taken by other publishers, and how 
this helps Wiley begin to achieve the goals of FAIR data. 
2. RESEARCH DATA SHARING POLICY AT WILEY


At Wiley, we are making open research not just the future of research and research communication, but 
the here and now. We have four policy-level requirements for data sharing, adopted across our portfolio 
of journals [15]. 


1) “Encourages data sharing” is our entry-level policy. It enables journals serving communities in which
data sharing is not yet common to start their journey toward data sharing. There are no enforced
requirements.

2) “Expects data sharing” is a policy for journals that require every author to provide a data availability
statement confirming the presence or absence of shared data, together with a citation of any shared data.
It is equivalent to Level 1 of the Transparency and Openness Promotion (TOP) guidelines [16].

3) “Mandates data sharing” is a policy for journals that require a data availability statement, a data
citation, and sharing of the data. It is equivalent to TOP Level 2 [16].

4) “Mandates data sharing and peer reviews data” is a policy for journals that take the additional step
of peer reviewing the data. It is equivalent to TOP Level 3 [16].


Of course, we recognize that adopting open research practices can be challenging and requires cultural
change, as emphasized by Henriikka Mustajoki (Head of Development, Federation of Finnish Learned
Societies) [17]. Our four policy levels give journals the flexibility to adopt the policy that is right for their
research communities.


Tiered policies like these, adopted by major publishers, enable journals to adapt to the communities they
serve [18]. The Wiley data sharing policies are shown in Table 2, which maps each of them to the Transparency
and Openness Promotion (TOP) guidelines [16] that publishers and funders use to increase transparency.
Table 2. Four data-sharing policies adopted at Wiley and their features.


Policy | Data availability statement published? (a) | Data shared? (b) | Data peer reviewed? (c) | Example Wiley journals | TOP guideline level
Encourages Data Sharing | Optional | Optional | Optional | – | Not TOP compliant (i.e., “Level 0”)
Expects Data Sharing | Required | Optional | Optional | British Journal of Social Psychology | TOP Level 1
Mandates Data Sharing | Required | Required | Optional | Ecology and Evolution | TOP Level 2
Mandates Data Sharing and Peer Reviews Data | Required | Required | Required | Geoscience Data Journal; American Journal of Political Science | TOP Level 3

Notes: (a) A data availability statement confirms the presence or absence of shared data. (b) Links to data in
data availability statements are checked to ensure they link to the data that the authors intended. If data
have been stored in a data repository, the data availability statement includes a permanent link to the data.
Shared data are also cited. (c) The quality and/or replicability of linked data are peer reviewed. Depending on
the journal, this may mean reviewing data quality by ensuring that the results in the paper and the data in the
repository align (for example, that sample sizes and variables match), or reviewing replicability to ensure that
the claims presented in the journal article are valid and can be reproduced.
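Table 2 can also be read as a small rule table. The following is a minimal, hypothetical sketch (not a Wiley system) that encodes each policy's three requirements and its TOP level, then reports which required elements a submission is missing. The names `DataPolicy`, `Submission`, `missing_requirements` and the dictionary keys are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Requirement(Enum):
    """How a policy treats one feature listed in Table 2."""
    OPTIONAL = "optional"
    REQUIRED = "required"


@dataclass(frozen=True)
class DataPolicy:
    name: str
    statement: Requirement      # (a) data availability statement published?
    data_shared: Requirement    # (b) data shared (and cited)?
    peer_reviewed: Requirement  # (c) data peer reviewed?
    top_level: int              # TOP guideline level; 0 means not TOP compliant


O, R = Requirement.OPTIONAL, Requirement.REQUIRED

# One entry per row of Table 2; the dictionary keys are illustrative.
POLICIES = {
    "encourages": DataPolicy("Encourages Data Sharing", O, O, O, 0),
    "expects": DataPolicy("Expects Data Sharing", R, O, O, 1),
    "mandates": DataPolicy("Mandates Data Sharing", R, R, O, 2),
    "mandates_peer_review": DataPolicy(
        "Mandates Data Sharing and Peer Reviews Data", R, R, R, 3),
}


@dataclass
class Submission:
    """Hypothetical summary of what a manuscript provides."""
    has_statement: bool
    data_shared: bool
    data_peer_reviewed: bool


def missing_requirements(policy: DataPolicy, sub: Submission) -> list[str]:
    """Return the elements the policy requires that the submission lacks."""
    checks = [
        ("data availability statement", policy.statement, sub.has_statement),
        ("shared and cited data", policy.data_shared, sub.data_shared),
        ("data peer review", policy.peer_reviewed, sub.data_peer_reviewed),
    ]
    return [label for label, requirement, present in checks
            if requirement is Requirement.REQUIRED and not present]


if __name__ == "__main__":
    sub = Submission(has_statement=True, data_shared=False,
                     data_peer_reviewed=False)
    print(missing_requirements(POLICIES["expects"], sub))   # []
    print(missing_requirements(POLICIES["mandates"], sub))  # ['shared and cited data']
```

Because each tier only turns more features from optional to required, moving a journal up a level never relaxes a check, which mirrors the stepwise adoption the policies are designed for.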


3. THE RESEARCH DATA SHARING LANDSCAPE 


Many publishers are adopting data sharing policies that either encourage or require researchers to share
their underlying data [18]. These developments go hand in hand with requirements from institutions and
funders [13]. The characteristic features of major publishers’ data policies can be compared by mapping them
to the TOP guidelines [17]. However, while publisher data policies have common elements, there is a
recognized need for further standardization [19].


With the adoption of data sharing policies comes the possibility of evaluating their impact. Are researchers
compliant? How are data shared? Findings from a recent analysis suggest that most researchers share data
within the published article itself rather than via a repository [20], but more research is needed to
understand the issues researchers face in sharing their data.


4. UNDERSTANDING RESEARCHER NEEDS 


The 2016 Wiley Open Science survey, built on earlier work by Wiley in 2014, gathered opinions on 
data-sharing from over 4,600 researchers worldwide [21]. It identified researchers’ motivations to share data 
(Figure 1), as well as what they find most challenging about data sharing. Wiley is continuing to collect 
data on how researchers across all disciplines approach open access, open data, peer review and 
collaboration, and will report the new data in 2019. 
[Figure 1 (infographic). Panels: “Researchers sharing data by region”; “Data accessibility trends” (researchers
who spent a large amount of time making their data reproducible, used other researchers’ publicly available
data, or checked another paper’s source data); “Top 4 researcher motivations for sharing data” (increase the
impact and visibility of my research; public benefit; transparency and reuse; journal requirement); “Top 4
reasons why researchers are hesitant to share their data” (50%, intellectual property or confidentiality issues;
31%, ethical concerns; 23%, concern about misinterpretation or misuse of my research; 22%, concern that my
research will be scooped).]

Figure 1. Selected insights from Wiley Open Science survey, 2016. The whole infographic with many more 
insights is described in detail by Wiley [21] and is available on Figshare [22]. 


5. PROMOTING DATA SHARING 


In November 2018, during International Data Week [23], we began a campaign to implement the Wiley
“Expects Data” data sharing policy more broadly. Our goal was to step up the support we offer to
researchers who want or need to share their data by transitioning journals from our “Encourages Data”
data sharing policy to “Expects Data” [14].
Our first step was to create a toolkit to brief publishing colleagues, so that they could liaise effectively with
journal editors and then, together with them, implement the requirements of the “Expects Data” policy: a
data availability statement and data citations in every article. The data sharing team provided everything
that journals would need, including support for authors in the form of template data availability statements,
instructions on how to cite the data they share, and advice on finding appropriate repositories in which to
share their data [15]. We began implementation by selecting journals serving the disciplines most ready for
data sharing, and introduced our new “Expects Data” policy to those journals first.


At that time (November 2018), more than 1,500 journals had the entry-level “Encourages Data” policy,
which placed no specific data-sharing requirements on researchers. We also published a much smaller
number of journals (more than 20) that had adopted an earlier version of our “Expects Data” policy, which
emphasized the benefits of data sharing to researchers but still set no specific requirements for it. Alongside
these, we published a similarly small number of journals (about 20) with a “Mandates Data” policy,
including the leading journals in the Wiley evolutionary biology portfolio.


Since 2018, we have made significant progress at Wiley. By April 2019, 90 journals had adopted and
implemented our “Expects Data” policy, and 70 journals had adopted the “Mandates Data” policy. Examples
of journals with the “Expects Data” policy are shown in Table 3. Each of these journals now requires a data
availability statement in every article it publishes, as well as data citations. To make the whole process
straightforward for authors, we created a series of standard templates for data availability statements,
shown in Table 4.


Table 3. Ten example journals that have adopted the Wiley Expects Data policy. 


Journal title | ISSN | Homepage
Acta Neurologica Scandinavica | 1600-0404 | https://onlinelibrary.wiley.com/journal/16000404
Applied Stochastic Models in Business and Industry | 1526-4025 | https://onlinelibrary.wiley.com/journal/15264025
Brain and Behavior | 2162-3279 | https://onlinelibrary.wiley.com/journal/21579032
Chemical Biology and Drug Design | 1747-0285 | https://onlinelibrary.wiley.com/journal/17470285
Clinical Endocrinology | 1365-2265 | https://onlinelibrary.wiley.com/journal/13652265
Clinical Genetics | 1399-0004 | https://onlinelibrary.wiley.com/journal/13990004
Environmetrics | 1099-095X | https://onlinelibrary.wiley.com/journal/1099095x
Immunity, Inflammation and Disease | 2050-4527 | https://onlinelibrary.wiley.com/journal/20504527
Research Synthesis Methods | 1759-2887 | https://onlinelibrary.wiley.com/journal/17592887
Pharmaceutical Statistics | 1539-1612 | https://onlinelibrary.wiley.com/journal/15391612
Table 4. Template data availability statements from [15].


Availability of data | Template for data availability statement
Data openly available in a public repository that issues data sets with DOIs | The data that support the findings of this study are openly available in [repository name e.g., “figshare”] at http://doi.org/[doi], reference number [reference number].
Data openly available in a public repository that does not issue DOIs | The data that support the findings of this study are openly available in [repository name] at [URL], reference number [reference number].
Data derived from public domain resources | The data that support the findings of this study are available in [repository name] at [URL/DOI], reference number [reference number]. These data were derived from the following resources available in the public domain: [list resources and URLs].
Embargo on data due to commercial restrictions | The data that support the findings will be available in [repository name] at [URL/DOI link] following an embargo from the date of publication to allow for commercialization of research findings.
Data available on request due to privacy/ethical restrictions | The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
Data subject to third party restrictions | The data that support the findings of this study are available from [third party]. Restrictions apply to the availability of these data, which were used under license for this study. Data are available [from the authors/at URL] with the permission of [third party].
Data available on request from the authors | The data that support the findings of this study are available from the corresponding author upon reasonable request.
Data sharing not applicable (no new data generated) | Data sharing is not applicable to this article as no new data were created or analyzed in this study.
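Completing a template amounts to substituting each bracketed field and making sure none is left blank. The sketch below illustrates that step under stated assumptions: it copies two of the Table 4 templates, rewrites the bracketed fields as named placeholders (`repository`, `doi`, `reference` are our illustrative names, not Wiley's), and refuses to return a statement with an unfilled field. The repository name and DOI in the example are made up.

```python
import re

# Two Table 4 templates, with the bracketed fields rewritten as named
# placeholders; the placeholder names are illustrative, not Wiley's.
TEMPLATES = {
    "public_repository_doi": (
        "The data that support the findings of this study are openly "
        "available in {repository} at http://doi.org/{doi}, reference "
        "number {reference}."
    ),
    "no_new_data": (
        "Data sharing is not applicable to this article as no new data "
        "were created or analyzed in this study."
    ),
}


def fill_statement(kind: str, **fields: str) -> str:
    """Complete one template, refusing to leave any placeholder unfilled."""
    template = TEMPLATES[kind]
    needed = set(re.findall(r"\{(\w+)\}", template))
    missing = needed - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return template.format(**fields)


# Example values below are made up for illustration.
print(fill_statement("public_repository_doi", repository="figshare",
                     doi="10.1234/example", reference="42"))
```

Failing loudly on a missing field is a deliberate choice in this sketch: a statement published with a literal “[doi]” placeholder would defeat the purpose of linking to the data.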


We also publish several journals, including EMBO Reports, The EMBO Journal and EMBO Molecular
Medicine, that have adopted our highest data policy, “Mandates and Peer Reviews Data”, setting the
standard for data transparency (and also for data citation, discussed in the next section). Beyond our
data sharing policies, we partner with repositories such as Figshare and Dryad to make it easier for authors to
share data in approved repositories. We develop standards and guidance that enable researchers to share
and cite their research data more readily [24, 25]. We adopt and encourage the use of Center for Open
Science badges, and over 30 journals use these to recognize and celebrate authors who share data. We are
also launching an Open Science Ambassador Program in China, in which contributing and sharing open data
will be important components.


6. CITING DATA 


Wiley also endorses the FORCE11 Joint Declaration of Data Citation Principles [26], a set of guiding
principles for citing data within scholarly literature, another data set, or any other research object. We
recommend the data citation format proposed in this Joint Declaration, and we recommend that data held in
institutional, subject-focused, or more general data repositories be cited. At the same time, we do not intend
to replace community standards such as in-line citation of GenBank accession codes; instead, we hope to
supplement them with formal data citations. This is one way to begin enabling researchers who share data
to be recognized in the same way that they are recognized when their research articles are cited. Data
citation like this is not new to Wiley policies, but the emphasis on data citation within the new Wiley data
sharing policies is, and it is in line with industry standards and initiatives to recognize data as a primary
research object.
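A formal data citation bundles the elements that credit creators and uniquely identify the data. The sketch below assembles such an entry from hypothetical metadata. The element order (creators, year, title, optional version, repository, DOI link) is an assumption modeled on a common DataCite-style pattern, not the exact format of the Joint Declaration or of any Wiley journal; house styles vary, and the record itself is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    creators: list[str]
    year: int
    title: str
    repository: str               # publisher of the data set, e.g. a repository
    doi: str                      # persistent identifier, without resolver prefix
    version: Optional[str] = None


def format_data_citation(d: DatasetRecord) -> str:
    """Build: Creators (Year). Title. [Version N.] Repository. DOI link.

    The element order is an assumed DataCite-style pattern.
    """
    parts = [f"{'; '.join(d.creators)} ({d.year}). {d.title}."]
    if d.version:
        parts.append(f"Version {d.version}.")
    parts.append(f"{d.repository}.")
    parts.append(f"https://doi.org/{d.doi}")
    return " ".join(parts)


# Invented record, for illustration only.
record = DatasetRecord(["Doe, J.", "Roe, R."], 2019,
                       "Example survey responses", "Figshare", "10.1234/example")
print(format_data_citation(record))
# Doe, J.; Roe, R. (2019). Example survey responses. Figshare. https://doi.org/10.1234/example
```

Ending the entry with a resolvable DOI link is what lets the citation serve both credit and unique identification, which is why the sketch treats the DOI as a required field.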


7. WORKING TOWARD FAIR DATA 


At Wiley, we believe that the introduction of data sharing policies is the first step toward supporting and 
embracing the FAIR guiding principles [9]. While our policies actively support data sharing (“Expects Data 
Sharing”) or require data sharing (“Mandates Data Sharing”), the task of making shared data actually FAIR 
remains with researchers. For many this will be a new responsibility, and it can present some challenges. 
We have begun work to help overcome those challenges. 


For example, authors who select the first of our template data availability statements (“Data
openly available in a public repository that issues data sets with DOIs”) [15] are indicating that their data
are “F” (Findable, with a unique and persistent identifier, the DOI) and “A” (Accessible, retrievable by that
identifier). Journals that adopt our level 4 policy, “Mandates Data Sharing and Peer Reviews Data”, peer
review the data submitted alongside journal articles and, in doing so, help authors make their data ready to
be “R” (Reusable).
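The reasoning above (a DOI in the statement signals Findable and Accessible) can be checked mechanically. The sketch below is a heuristic under stated assumptions: it uses a DOI pattern adapted from one Crossref has suggested for modern DOIs, and the function names `extract_dois` and `fair_signals` are hypothetical. It deliberately reports nothing about “I” or “R”, which cannot be judged from the statement text.

```python
import re

# Pattern adapted from a regular expression Crossref has suggested for modern DOIs.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:a-z0-9]+", re.IGNORECASE)


def extract_dois(statement: str) -> list[str]:
    """Find DOI-like strings, dropping trailing sentence punctuation."""
    return [m.rstrip(".;") for m in DOI_PATTERN.findall(statement)]


def fair_signals(statement: str) -> dict[str, bool]:
    """Flag the F and A signals that a DOI in the statement suggests.

    Interoperable and Reusable cannot be judged from the statement text.
    """
    has_doi = bool(extract_dois(statement))
    return {"findable": has_doi, "accessible": has_doi}


example = ("The data that support the findings of this study are openly "
           "available in figshare at http://doi.org/10.1234/example, "
           "reference number 42.")
print(extract_dois(example))  # ['10.1234/example']
print(fair_signals(example))  # {'findable': True, 'accessible': True}
```

A statement such as “available from the corresponding author upon reasonable request” yields no DOI and therefore neither signal, which matches the distinction the templates draw.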


Each of these steps moves us closer to the goal of turning open data into FAIR data, although often the 
“I” of FAIR (Interoperable) remains a challenge. The following section shares examples of work we are 
leading or contributing to at Wiley that take us even closer to that goal. Collaboration from all parties — 
researchers, funders, institutions, policy-makers, infrastructure providers (like repositories), and publishers 
— is vital to make FAIR data a reality. 


8. EXAMPLES OF PROGRESS TOWARD FAIR DATA 


American Geophysical Union and Enabling FAIR Data. The American Geophysical Union (AGU), together
with Wiley and other partners (including repositories and supporting organizations), has an ongoing
project to enable FAIR data across the earth and space sciences, aptly called Enabling FAIR Data [27]. This
builds on the work of the Coalition on Publishing Data in the Earth and Space Sciences (COPDESS) [28].
Large and complex data sets are common in the earth and space sciences, which makes this initiative
particularly welcome.
Remarkable practice at Geoscience Data Journal and American Journal of Political Science. Journals
that “Mandate Data Sharing and Peer Review Data” have already adopted remarkable practices toward
sharing FAIR-compliant data and are to be applauded. Examples include Geoscience Data Journal, published
by Wiley [29], and the American Journal of Political Science, published by Wiley for the Midwest Political
Science Association [30].


SourceData at EMBO Press. The team at EMBO Press, for which Wiley provides publishing services,
introduced SourceData in 2018 [31]. SourceData is a significant step on the road to FAIR data: it makes
data findable, accessible, interconnected and downloadable.


“Next” journals at Wiley: Genetics & Genomics Next and Neuroscience Next. Wiley’s new Next
journals, Genetics & Genomics Next [32] and Neuroscience Next [33], support FAIRsharing.org [2] and
endorse the FAIR data principles in their own research disciplines.


Research Data Alliance and standard data sharing policies. Wiley is a member of the community
that comes together as the Research Data Alliance (RDA). RDA is building the social and technical
infrastructure that researchers need to share data successfully. For example, the RDA’s Data Policy
Standardization Interest Group is creating a unified approach to data policies by identifying standard
requirements for data sharing and showing how these can be combined into a robust data policy [34].


9. CONCLUSIONS: NEXT STEPS 


At Wiley, we believe that open research is not just the future of research communications; it is the here
and now [8]. Publishers are fundamentally service providers for researchers, whether those researchers are
acting as authors, peer reviewers, editors or readers. Our careful implementation of open research practices,
including data-sharing policies and open data badges, is intended to help researchers adopt new practices
and benefit from the additional impact they bring. We look forward to seeing the results of this work in the
form of published data availability statements and data citations. Looking further ahead, we intend to
measure the success of our “Expects Data” policy implementation by tracking the publication of data
availability statements and data citations. With this information, we will be able to celebrate the adoption of
new practices by the research communities we work with and serve, and to showcase researchers from those
communities who are leading in open research.
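The measurement described above could start from something as simple as two adoption rates per journal. The sketch below assumes hypothetical per-article records (journal, whether a data availability statement was published, and how many data sets are cited) and computes the share of articles with a statement and the share citing at least one data set. The record layout and journal names are illustrative, not a description of Wiley's actual tracking.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Article:
    journal: str
    has_statement: bool   # was a data availability statement published?
    data_citations: int   # number of data sets cited in the reference list


def adoption_by_journal(articles: list[Article]) -> dict[str, tuple[float, float]]:
    """Per journal: (share of articles with a statement, share citing data)."""
    counts = defaultdict(lambda: [0, 0, 0])  # [articles, with statement, citing data]
    for a in articles:
        c = counts[a.journal]
        c[0] += 1
        c[1] += a.has_statement
        c[2] += a.data_citations > 0
    return {journal: (with_stmt / n, citing / n)
            for journal, (n, with_stmt, citing) in counts.items()}


# Made-up records, for illustration only.
sample = [
    Article("Journal A", True, 1),
    Article("Journal A", True, 0),
    Article("Journal A", False, 0),
    Article("Journal A", True, 2),
    Article("Journal B", False, 0),
]
print(adoption_by_journal(sample))
# {'Journal A': (0.75, 0.5), 'Journal B': (0.0, 0.0)}
```

Tracking the two rates separately matters because an “Expects Data” journal should approach full statement coverage even when many articles legitimately share no data to cite.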